Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make ishermitian and issym test for approx. symmetry for floats. fix #10298 #10369

Closed
wants to merge 1 commit into from

Conversation

dhoegh
Copy link
Contributor

@dhoegh dhoegh commented Mar 1, 2015

This fixes JuliaLang/LinearAlgebra.jl#182.
I do not know if the atol,rtol keywords should be used on isapprox and how they should be determined for the matrix in issym and ishermitian.
@andreasnoack would you give this a thorough review, since I am not familiar with this part of base.

@stevengj
Copy link
Member

stevengj commented Mar 1, 2015

@dhoegh, !isapprox(A[i,j], transpose(A[j,i])) does not seem right to me, because the default atol in isapprox makes no sense (because it is not scaled by the overall norm of A). Not to mention that the default tolerances in isapprox are huge.

In particular, consider A = [1 1e-16; 2e-16 1]. Then your issym(A) will (sensibly) return true, but= issym(A * 1e16) will (nonsensically) return false. Conversely, for A = [1e-16 1e-16; 2e-16 1e-16], your function will (incorrectly) return issym(A) == true.

(In general, I think having a default atol != 0 in isapprox is an open invitation to problems of this sort. You can only check absolute tolerances in floating point if you know an absolute scale.)

@dhoegh
Copy link
Contributor Author

dhoegh commented Mar 1, 2015

Ok, kind of already knew that the isapprox without a proper chosen atol,rtol but it was not obvious to me how they could be chosen.

@andreasnoack
Copy link
Member

Thanks for working on this. As @stevengj points out, the metric for this should take the whole matrix into account instead of comparing element by element. I think something along the lines of the @stevengj proposed, e.g. norm(A - A', Inf)<tol*norm(A,Inf). Notice that norm(A+A',Inf) = norm(tril(A+A'),1) and it would be great if the calculation is done without allocation.

@dhoegh
Copy link
Contributor Author

dhoegh commented Mar 2, 2015

Ok, I will work towards to do the calculation without allocation. One issue though, I can't seem to replicate norm(A+A',Inf) = norm(tril(A+A'),1)

julia> norm(A + A',Inf)
218.15993483141798

julia> norm(tril(A + A'),1)
201.2828137897286

@andreasnoack
Copy link
Member

No. It's not true. I don't know why I got that wrong.

2015-03-02 14:43 GMT-05:00 Daniel Høegh [email protected]:

Ok, I will work towards to do the calculation without allocation. One
issue though, I can't seem to replicate norm(A+A',Inf) =
norm(tril(A+A'),1)

julia> norm(A + A',Inf)
218.15993483141798

julia> norm(tril(A + A'),1)
201.2828137897286


Reply to this email directly or view it on GitHub
#10369 (comment).

@dhoegh dhoegh force-pushed the issym_ishermitian branch from c612fff to 9739ab6 Compare March 8, 2015 09:47
@dhoegh
Copy link
Contributor Author

dhoegh commented Mar 8, 2015

I have updated the pull where issym/hermitian is based on norm(A - A.', Inf)<tol*norm(A,Inf)/norm(A - A', Inf)<tol*norm(A,Inf). I have also added docs, please correct them since I am not a native speaker. 3bee50c is only introduced to make issym/ishermitian faster when tol=0. is requested.

The last thing missing in this pull is probably better test cases, do anyone have a good idea for a test case?

@stevengj
Copy link
Member

stevengj commented Mar 8, 2015

This still seems like a can of worms to determine what tol and what norm are appropriate. (1e-10 seems pretty big.)

@dhoegh
Copy link
Contributor Author

dhoegh commented Mar 8, 2015

The failures are due to tol for Float32 is too large. tol could be determined dependent on input type as tol=eps(T)*10^2.

If we abandon to test ishermitian/issym I think we should remove the high level isposdef in

isposdef!(A::StridedMatrix) = ishermitian(A) && isposdef!(A, :U)
. This should be done to force the user to recognize that the test is performed on either the upper or lower part of the matrix. The current situation is that isposdef will just return false because only very few arrays containing floats can pass the ishermitian before the LAPACK.potrf! is called.

@andreasnoack
Copy link
Member

@stevengj Isn't this similar to having tolerances in rank, pinv and our least squares solver? I think that it would be good to analyze the impact of the choice of metric here, but I think that a slightly wrong tolerance is better than zero which results in issym returning false for almost all symmetric matrices that are output from linear algebra functions.

@dhoegh
Copy link
Contributor Author

dhoegh commented Mar 18, 2015

Bump what is the way forward, should I take the implementation of
isposdef!(A::HermOrSym) and isposdef{T}(A::HermOrSym{T}) to another pull?

From my point of view it's seems misleading to have implemented an isposdef!(A::StridedMatrix) when it will return false in all practical cases and hence lead the user to think the matrix is not positive definite.

@dhoegh
Copy link
Contributor Author

dhoegh commented Mar 18, 2015

Btw Matlab do not supply an isposdef function. We could just remove the high level isposdef{T}(A::AbstractMatrix{T}) and then only supply isposdef{T}(A::AbstractMatrix{T}, UL::Symbol) and hence make the user aware of the how it is tested.

@andreasnoack
Copy link
Member

No. Let's continue along this way. I agree with you and I think it is the correct thing to do. The old behavior will be a special case of the new with tol=0.0, so if someone rely on that they can have it.

The problem is to find the right default value for the tolerance and here it would be great to analyse the problem a bit. Some people reading this might find such analyses very easy so it would be great to have a reference, some plots and/or some math arguments for the choice of default tolerance.

@toivoh
Copy link
Contributor

toivoh commented Mar 18, 2015

I'm used to the point of view that A is positive definite iff x'*A*x > 0 for all nonzero x, ie if A+A' is positive definite. I think that it would make more sense to either use this definition (decoupling the property from whether the matrix is Hermitian) or to throw an error if it is not (approximately, at least) Hermitian.

@andreasnoack
Copy link
Member

It's a matter of culture. In my first year math exam I'd get an error if I concluded that a non-symmetric matrix was positive definite. So far we have followed that tradition here.

I guess there is a computational argument. You can check for my kind of definiteness with the Cholesky factorization whereas you'd have to calculate the eigen decomposition to check for your kind of definiteness, right?

Even if we decide to use the other definition of definiteness, then I think it is too restrictive to require exact match in issym and ishermitian for floating point values.

@toivoh
Copy link
Contributor

toivoh commented Mar 18, 2015

For my kind of definiteness, you have to symmetrize the matrix before calling Cholesky, which is an additional step, of course. I still don't like that you could be mislead to believe that your matrix was not positive definite because it was not Hermitian enough, even if the Hermitian part was positive definite.

@dhoegh dhoegh force-pushed the issym_ishermitian branch 2 times, most recently from 58f16ae to 753f81f Compare March 18, 2015 20:32
@andreasnoack
Copy link
Member

I think that it would make more sense to either use this definition (decoupling the [positive definiteness] property from whether the matrix is Hermitian)

According to my favorite matrix analysis book Hermitianity is actually a consequence of x'Ax>0 for all x. There is more room in the real case where x'A'x>0 only restricts the symmetric part. My favorite matrix analysis book also agrees with my math teacher on the positive definiteness of real matrices, but that is of course a definition, not a theorem. I've heard that these non-symmetric x'Ax>0 matrices play a role in physics, but I don't know how common they are. As long as we clearly document our definition I think we'd be okay.

@stevengj
Copy link
Member

@toivoh, if A is not Hermitian, then x'*A*x is not generally real and can't be compared with zero at all. For complex fields, I think what you want is the condition Re[x'*A*x] > 0 for nonzero x, which indeed is equivalent to A' + A positive-definite (i.e. the Hermitian part of A is positive-definite). I agree that this is probably the best generalization of definiteness to non-Hermitian matrices, and I use this definition myself in various contexts (e.g. appendix C of this paper uses it in an operator context, where for the non-Hermitian Maxwell operator the condition of definiteness (of the Hermitian part) is equivalent to passivity, i.e. non-negative dissipation).

But I think that most people would currently think of it as a generalization of the positive-definite concept, not positive-definiteness per se. If anyone wants to check for this they can always call isposdef(A + A'), which is pretty cheap.

@toivoh
Copy link
Contributor

toivoh commented Mar 19, 2015

Yes, that's true.

@toivoh
Copy link
Contributor

toivoh commented Mar 19, 2015

So my point was really that it seems it could be quite confusing to people to if isposdef(A) returns false because A is not Hermitian. Most people seem to think (I believe) either that the question is not even meaningful unless A is Hermitian, or that it has nothing to do with whether it is Hermitian.

@stevengj
Copy link
Member

It wouldn't be crazy for isposdef(A) to be equivalent to isposdef(A+A'). We have to return some answer for non-Hermitian matrices, and @toivoh has a point that false is the wrong answer if you think that the question isn't even defined (since x'*A*x is not even real for complex non-Hermitian A). So, from that perspective, we can either throw an exception or return the "generalized" answer.

@stevengj
Copy link
Member

@andreasnoack, you wrote According to my favorite matrix analysis book Hermitianity is actually a consequence of x'Ax>0 for all x. This statement is false, even for real matrices A where x'Ax is real (for real x), so you must be mis-reading/remembering your book. Oh, never mind, I see that you were referring to the complex case (where Hermiticity is a consequence of x'Ax being real, not specifically of it being positive).

Consider the matrix A = [2.4 2.9 0.4; 0 0.9 0.2; -0.1 -0.4 2.2]. This is not only non-symmetric, but it is also not similar to any Hermitian matrix (i.e. in any basis) because it has a pair of complex eigenvalues. However, A + A' is positive definite (or equivalently the real parts of eigvals(A) are > 0), and it follows that x'*A*x > 0 for all real x ≠ 0 (since x'*A*x = (x'*A*x)' = x'*A'*x, so x'*A*x = x'*(A+A')*x/2).

@andreasnoack
Copy link
Member

I'm fine we either of the two solutions. When we have isposdef for Symmetric and Hermitian where is a way to by pass the extra costs of symmetrizing the matrix.

@andreasnoack
Copy link
Member

@stevengj I wrote: According to my favorite matrix analysis book Hermitianity is actually a consequence of x'Ax>0 for all x. There is more room in the real case where x'Ax>0 only restricts the symmetric part. so you are not providing a counter example because you have restricted x to be real which was not part of the definition.

@andreasnoack
Copy link
Member

This came up again in #13753 so I think we should finalize this pr. It would be good if we could find some theory on this, but it seems like eps()*norm(A,1) would be a reasonable default value. For a range of condition numbers and norms of A, it gives issymmetric(Q*Λ*Q') => true. @dhoegh sorry that this stalled. Would you mind updating this pr?

cc: @bnels

@poulson
Copy link

poulson commented Oct 25, 2015

With respect to the discussion on the definition of "Hermitian", if we're talking about matrices and not (possibly infinite-dimensional) linear operators, I think that the important property is that there exists a unitary spectral decomposition with real eigenvalues, and, of course, equivalently, A is equal to its adjoint.

But I think that the discussion about x' A x > 0 is dangerous because the appropriate generalization in the presence of small errors is that x' A x lies close to the positive real line. It seems that several of the tests currently assume that the imaginary component of the diagonal is exactly zero, when there should instead be a test for the imaginary part being sufficiently small.

@dhoegh dhoegh force-pushed the issym_ishermitian branch from 753f81f to b45d5f4 Compare October 25, 2015 19:04
@dhoegh dhoegh force-pushed the issym_ishermitian branch from b45d5f4 to 80b7c20 Compare October 25, 2015 19:22
@dhoegh
Copy link
Contributor Author

dhoegh commented Oct 25, 2015

I have rebased the PR, added tol=eps()*norm(A,1) and squashed the commits.

@KristofferC
Copy link
Member

I guess we need to add the same functionality on the sparse side?

@@ -7211,9 +7211,9 @@ Given `@osx? a : b`, do `a` on OS X and `b` elsewhere. See documentation for Han
:@osx

doc"""
ishermitian(A) -> Bool
ishermitian(A; tol=eps(eltype(real(A)))*norm(A,1)) -> Bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you mean real(eltype(A)). Doing real(A) makes a real copy of the matrix if it is complex.

@stevengj
Copy link
Member

The default tol should certainly be proportional to eps * somenorm, but it's not really clear to me what the norm or the proportionality constant should be. It concerns me that no real error analysis (or even numerical experimentation) has been done in this PR.

Let's consider the application of this to eigvals(A), where we might use this check to decide whether to use the more-efficient eigensolvers available for Hermitian matrices. Consider A=H+δB, where H is Hermitian and δB is some small skew-Hermitian perturbation. The change δλ in an eigenvalue λ of H due to δΒ is given (to first order in δΒ) by x' δB x / (x'x) where x is the eigenvector, via perturbation theory. Suppose that we want the fractional error δλ/λ = x' δB x / (x' H x) to be ≤ ε where ε is some constant (probably proportional to the machine precision). We can bound x' δB x / (x' H x) ≤ norm(δB,2)*norm(inv(H),2), but this is unbounded if H is singular. If instead we want to have δλ/|λₘ| ≤ ε, where λₘ = norm(H,2) is the eigenvalue of H with the largest magnitude, then it is sufficient to require norm(δΒ,2) ≤ ε norm(H,2). Note however, that this analysis requires the error bound to be on the induced 2-norm, not the 1-norm. (The induced 2-norm is bounded by the induced 1-norm times a constant that depends on the matrix size, but I don't know offhand what that constant is.)

The other main use of checking whether a matrix is nearly Hermitian is to decide whether to use the Cholesky factorization for solving A \ b. There, note that inv(A) = inv(H) - inv(H) * δB * inv(H) to first order in δB. Hence, the fractional error norm(A \ x - H \ x) / norm(H \ x) is norm(inv(H) δB H \ x) / norm(H \ x) ≤ norm(inv(H)) norm(δB). So, in this case if we want the Cholesky factorization solution in terms of H to be accurate to relative error ε then it is sufficient to require norm(inv(H)) norm(δB) ≤ ε (in whatever norm you want the solution to be accurate). So, here, you can use the 1-norm if you want, but you need to compare the norm(δB) of the error to ε/norm(inv(A)), not ε*norm(A). But maybe that error tolerance is too strict — for badly conditioned problems, maybe we can accept a less-accurate solution...if we multiply the acceptable ε by a relative condition number of Α = norm(A)*norm(inv(A)), then we get back to norm(δB) ≤ ε norm(A).

So, it seems like answering the question of whether a matrix is "close enough" to Hermitian may depend on the application.

@poulson
Copy link

poulson commented Oct 26, 2015

@stevengj Hermitian eigenvalue solvers typically only provide high absolute accuracy, not high relative accuracy, so I think that this is asking too much. The backwards error of a Hermitian eigensolver is typically of the order of epsilon times the two-norm (see, for example, http://www.netlib.org/lapack/lawnspdf/lawn84.pdf). I therefore think that a deviation of order epsilon times the two-norm from Hermitian is acceptable for the projection. Similar approximations are made to deflate the QR iteration for upper Hessenberg matrices.

With that said, it is sometimes possible to achieve high relative accuracy, so perhaps there should be a flag in the eigensolver to make such an attempt and such approximations could be avoided (but, with the exception of bidiagonal SVD and symmetric tridiagonal EVP, I think custom software would need to be written).

EDIT: Sorry for misreading your statement; I didn't realize you were using lambda as the largest eigenvalue.

@stevengj
Copy link
Member

@poulson, saying that |δλ| = Ο(ε)*norm(A,2) is equivalent to saying that the relative error δλ/|λₘ| = O(ε), where |λₘ| is the maximum eigenvalue magnitude = norm(A,2) for Hermitian matrices. So, as I said, if you don't want to lose this property then you need norm(δB) = O(ε) norm(A) in the induced 2-norm, where δB is the anti-symmetric part of A.

You can convert this into a statement about the 1-norm with appropriate constant factors, but I'm not sure whether the current test norm(δB,1) ≤ ε norm(A,1) is exactly what we want. We currently seem to be pulling the constant factor of 1 out of thin air.

And, as I said, the application to A \ x is related but not quite identical.

I'd like to see some (a) more careful analysis and (b) some numerical experiments of the accuracy impact of this PR. Or at least (b).

@poulson
Copy link

poulson commented Oct 26, 2015

@stevengj I completely agree that the tolerance should be different for solving a linear system; my comment was only meant to relate to eigenvalue problems.

And I agree that we should not use the constant of one; || A ||_1 / sqrt(m) <= || A ||_2 <= sqrt(n) * || A ||_1, so we should scale by the square-root of the matrix size for eigenvalue problems rather than by one.

I would lean towards using tighter tolerances for linear systems since the relative cost of solving a symmetric versus a non symmetric linear system is small compared with the cost of a Hermitian eigenvalue problem relative to a Schur decomposition.

EDIT: The font works in another browser.

@stevengj
Copy link
Member

Probably your font is missing some glyphs for Unicode codepoints. (I'm switching back and forth between English and Greek keyboard inputs, so the codepoints might be unusual.)

@poulson
Copy link

poulson commented Oct 26, 2015

In summary, I would recommend that we try to preserve high absolute accuracy in projections to more structured forms for eigenvalue problems (until Julia supports algorithms which are likely to yield high relative accuracy), and for projections to preserve high relative accuracy when solving linear systems. As @stevengj said, the latter requires that the perturbations be on the order of epsilon times the smallest singular value rather than epsilon times the largest (consider a right-hand side living in the direction of the largest right singular vector and the perturbation of the matrix effecting its column space in the direction of the smallest right singular vector). Since knowing the smallest singular value is too expensive, we would need to decide on a condition estimator, but those typically required an initial (in general, LU) factorization...

@poulson
Copy link

poulson commented Oct 26, 2015

On the other hand, we could always be aggressive and attempt to solve the projected linear system using a heuristic projection tolerance, check the residual after two steps of iterative refinement (with the original matrix), and, if it is too large, fall back to the more expensive factorization of the original matrix. The caveat to this approach is the extra memory requirement, though it is no more than typically required by iterative refinement, and potentially performing an extra structured/symmetric factorization when an unstructured/unsymmetric factorization was required.

@stevengj
Copy link
Member

@poulson, what you're calling "absolute" error in the eigenvalue is still a essentially a relative error δλ / (max |λ|) = O(ε). (A "small absolute error" condition would be something like δλ ≤ 1e-8, independent of A, which is impossible to enforce. Absolute errors are dimensionful.)

Also, as I said, requiring small relative error |δx|/|x|=O(ε) in the solution x = A \ b is probably too strict, because the forward error |δx|/|x| is O(ε)κ(A) from roundoff errors anyway, where κ(A) is the relative condition number. If we require only |δx|/|x| = O(ε)*κ(A), then I think we get same condition |δB| = |A| O(ε) as for the eigenproblem.

So |δB| = |A| O(ε) seems right. The only question in my mind is what coefficient to use in the O(ε) for a chosen norm.

@andreasnoack andreasnoack added the linear algebra Linear algebra label Oct 26, 2015
@stevengj
Copy link
Member

In particular, the "constant" coefficient may depend on the matrix size, because the relationships between the norms depends on the matrix size.

The error analysis (particularly for the eigenvalues) is most natural in the induced-2 norm. Suppose we want norm(δB,2) ≤ # ε norm(A,2) for some size-independent constant #. But this norm is too expensive to compute, so suppose we want to use the induced 1-norm instead. If I remember correctly (please check before using!), norm(A,2) / sqrt(m) ≤ norm(A,1) ≤ norm(A,2) sqrt(m) for an m×m matrix. In this case, we would get the condition norm(δB,1) ≤ # ε norm(A,1) / m (although this is likely to be overly pessimistic for random δB).

@poulson
Copy link

poulson commented Oct 26, 2015

I agree. So we want the perturbation to satisfy norm(E,2) <= norm(A,1) / sqrt(m) <= eps norm(A,2). With that said, I think it is fairly standard to use the two-norm as the implicit unit when discussing the size of the "absolute error" in the eigenvalues.

At this point, what I am most more worried about is that I might have pushed us towards a confrontational tone. That was unintentional and perhaps the result of me firing off these responses while exhausted.

With that said, I completely agree that, by definition, the worst-case relative error of the solution from a stable linear solver is O(eps*cond(A)), but I think we can avoid always forcing ourselves into the worst-case regime by recognizing that, for example, it can be better to look at the condition number after a diagonal scaling (e.g., http://www.netlib.org/lapack/lawnspdf/lawn14.pdf).

One example showing a large difference in the relative residual after such a perturbation (that is recovered via iterative refinement with the unperturbed system) is

n=50;
A=randn(n,n); A(n,n)=1e13;
E=randn(n,n); E = E*(eps*norm(A))/norm(E);
APert=A+E;
[U,S,V]=svd(A);

xTrue=V(:,end);
b=A*xTrue;
x = A \ b;
xPert = APert \ b;

norm(A*x-b)/norm(b)
norm(A*xPert-b)/norm(b)

norm(x-xTrue)/norm(x)
norm(xPert-xTrue)/norm(x)

% Now perform iterative refinement to try to recover the perturbed solution
for j=1:5,
  xPert = xPert + APert\(b-A*xPert);
  norm(xPert-xTrue)/norm(xTrue)
end

outputs (in my instance)

ans =    8.7315e-14
ans =  0.033573
ans =    6.0674e-15
ans =  0.0012106
ans =    1.1314e-06
ans =    5.6175e-09
ans =    1.4628e-11
ans =    2.6568e-14
ans =    1.7758e-15

Another example that doesn't use singular vectors but demonstrates the relative error in the solution of an HPD matrix follows the condition number of the equilibrated matrix more closely than the condition number of the original matrix is the following, which would actually be a typical use-case. In particular, I construct a random matrix which is ill-conditioned (due to diagonal scaling) but close to HPD in the sense that its skew-Hermitian component has a two-norm which is on the order of epsilon times the two-norm of AScale, but the relative error in the true solution drops from O(1e-8) to O(1e-2) when making use of the nearby Hermitian matrix to solve the linear system.

n=100;
[Q,R]=qr(randn(n,n)); d=rand(n,1); B = Q*diag(d)*Q'; B=0.5*(B+B)';
scale=linspace(1,10^7,n);
BScale=diag(scale)*B*diag(scale); BScale=0.5*(BScale+BScale');

E = randn(n,n); E=E*(eps*norm(BScale))/norm(E);
AScale = BScale+E;
AScalePert = 0.5*(AScale+AScale');
norm(AScale-AScalePert)/norm(AScale)

xTrue = randn(n,n);
b = AScale*xTrue;

x = AScale \ b;
xPert = AScalePert \ b;

norm(x-xTrue)/norm(xTrue)
norm(xPert-xTrue)/norm(xTrue)

% Now try to iteratively refine the solution with the perturbed factorization
for j=1:3
   xPert = xPert + AScalePert\(b-AScale*xPert);
   norm(xPert-xTrue)/norm(xTrue)
end

which has output

ans =    1.5641e-16
ans =    3.7153e-08
ans =  0.011519
ans =    1.7495e-04
ans =    2.9398e-06
ans =    5.0337e-08

which shows that iterative refinement using the original system recovers the same accuracy after three steps (using higher-precision for the IR would be better).

So perhaps we need to consider diagonal equilibrations as well (or at least iteratively refine using the unprojected original system).

I apologize for all of the edits, as I had to teach two classes this morning.

@stevengj
Copy link
Member

@poulson, I think the norm equivalence works both ways. And you have to apply the norm bounds to both the left- and right-hand sides of the inequality (in opposite directions), since we're using the 1-norm for both A and δB, hence the factor of m rather than sqrt(m). (Again, this is pessimistic for random perturbations, where the 1-norm rarely approaches the sqrt(m) bound.)

(For the eigenvalues, the Hermitian perturbation theory I used is derived via the usual Euclidean inner product, which is why you are forced to use the 2-norm. For A\b the choice is more arbitrary.)

However, there is also an m dependence in the O(ε) for roundoff errors, and if you include this (i.e. you allow the asymmetry errors to grow at the same rate with m as the roundoff errors) the situation should be improved somewhat, though I haven't worked out the dependence.

@stevengj
Copy link
Member

@poulson, norm(A*x-b)/norm(b) is not the forward error (the error in x), which is what my derivation worked with, and what the condition number directly applies to.

@poulson
Copy link

poulson commented Oct 26, 2015

You're right that my example was rushed; I've modified it to show the phenomenon I was referring to. As can be seen, the relative error in the unperturbed solution is correct to fifteen digits, but the relative error in the perturbed solution is only correct to three.

@dhoegh
Copy link
Contributor Author

dhoegh commented Oct 26, 2015

I must admit this discussion surpasses my linear algebra understanding and I cannot follow through with this PR. Please feel free to use the commit from here if it is useful. The tests fail because there is not check for m==n after line https://github.com/dhoegh/julia/blob/80b7c207348944cc53cbaa81685dc00b4fdee4d0/base/linalg/generic.jl#L390

@dhoegh dhoegh closed this Oct 26, 2015
@poulson
Copy link

poulson commented Oct 26, 2015

@dhoegh I would argue that the conclusion is that (after my flurry of edits) is that a tolerance of 1/sqrt(m) norm(A,1) eps should be a good default for matrices where the two-norm is too expensive to estimate, and a tolerance of norm(A,2) eps should be used for matrices where the two-norm estimate is cheap (e.g., diagonal matrices). I showed that there are cases where there is a substantial loss in accuracy in naively solving the perturbed system, but that the accuracy can be recovered by employing iterative refinement with the usual system.

@stevengj
Copy link
Member

@poulson, I think your conclusion is partially incorrect, because:

  • we're taking the 1-norm of both the perturbation and the original matrix, so you need to apply the norm inequality twice (to both sides of the inequality), and hence you get two factors of sqrt(m)
  • that bound is overly pessimistic because it fails to take into account the growth of floating-point roundoff errors with m
  • there's still the question of the precise coefficient in front of all this, whether it is 1 or something else; this is best determined by numerical experiments.

@poulson
Copy link

poulson commented Oct 26, 2015

Good point on sqrt(m) squared. On the subject of the growth with respect to m, Higham has a nice chapter in Accuracy and Stability of Numerical Algorithms on this, but I think it is best to treat it as a low-degree polynomial of m, despite the existence of pathological cases where element-growth is exponential. But, in the presence of iterative refinement, these constants become somewhat irrelevant. I think that all analysis here needs to carefully take into account the effects of IR.

@stevengj
Copy link
Member

@poulson, the exponential worst-case growth is for LU factorization on arbitrary matrices; it doesn't apply to Cholesky anyway.

@poulson
Copy link

poulson commented Oct 26, 2015

No, but it does apply to symmetric-indefinite matrices: http://epubs.siam.org/doi/pdf/10.1137/100801548

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
linear algebra Linear algebra
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ishermitian test's A[i,j] != ctranspose(A[j,i]) which causes problems with numerical accuracy
6 participants